Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction
Abstract
When used to learn high-dimensional parametric probabilistic models, classical maximum likelihood (ML) learning often suffers from computational intractability, which has motivated the active development of non-ML learning methods. Yet, because of their divergent motivations and forms, the objective functions of many non-ML learning methods appear unrelated, and a unified framework for understanding them has been lacking. In this work, based on an information-geometric view of parametric learning, we introduce a general non-ML learning principle termed minimum KL contraction, in which we seek optimal parameters that minimize the contraction of the KL divergence between two distributions after they are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learning methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score matching [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL contraction framework with different choices of the KL contraction operator.
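Concretely, in standard notation (the symbols $p$, $q_\theta$, and $\Phi$ are our shorthand, not quotations from the paper), an operator $\Phi$ mapping distributions to distributions is a KL contraction operator if it never increases the KL divergence,
\[
D_{\mathrm{KL}}(p \,\|\, q_\theta) \;\ge\; D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big),
\]
and the minimum KL contraction principle then selects
\[
\theta^{\star} = \arg\min_{\theta}\; \Big[\, D_{\mathrm{KL}}(p \,\|\, q_\theta) - D_{\mathrm{KL}}\big(\Phi\{p\} \,\|\, \Phi\{q_\theta\}\big) \,\Big],
\]
where the bracketed contraction is non-negative and, for a suitable $\Phi$, vanishes only when the model $q_\theta$ matches the data distribution $p$.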
Similar papers
Loss-sensitive Training of Probabilistic Conditional Random Fields
We consider the problem of training probabilistic conditional random fields (CRFs) in the context of a task where performance is measured using a specific loss function. While maximum likelihood is the most common approach to training CRFs, it ignores the inherent structure of the task’s loss function. We describe alternatives to maximum likelihood which take that loss into account. These inclu...
Distributed Estimation, Information Loss and Exponential Families
Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole d...
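As a toy illustration of this two-step scheme (a minimal sketch only; the one-dimensional Gaussian setting, the linear parameter averaging, and all function names below are our own simplifications, not the paper's estimator):

import numpy as np

def local_mle(x):
    # Closed-form Gaussian MLE on one data subset: (mean, variance).
    return x.mean(), x.var()  # np.var with ddof=0 is the MLE

def combine(estimates, weights):
    # Combine local MLEs by a sample-size-weighted linear average,
    # the simplest one-shot aggregation of the local estimates.
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    means, variances = map(np.asarray, zip(*estimates))
    return float(w @ means), float(w @ variances)

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)
subsets = np.array_split(data, 4)                  # four "repositories"
local_estimates = [local_mle(x) for x in subsets]  # no raw-data sharing
mu_hat, var_hat = combine(local_estimates, [len(x) for x in subsets])
print(mu_hat, var_hat)  # close to the global MLE on the pooled data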
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle, which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU), is a nonlinear generalization of principal component analysis. We relat...
Modified Maximum Likelihood Estimation in First-Order Autoregressive Moving Average Models with some Non-Normal Residuals
When modeling time series data using autoregressive-moving average processes, it is common practice to presume that the residuals are normally distributed. However, we sometimes encounter non-normal residuals and asymmetry in the marginal distribution of the data. Despite the widespread use of pure autoregressive processes for modeling non-normal time series, the autoregressive-moving average models have le...
A Comparison of the Maximum Likelihood and Minimum Distance to Mean Classifiers for Land Cover Mapping (Case Study: Isfahan Province)
Land cover maps derived from satellite images play a key role in regional and national land cover assessments. In order to compare maximum likelihood and minimum distance to mean classifiers, LISS-III images from IRS-P6 satellite were acquired in August 2008 from the western part of Isfahan. First, the LISS-III image was georeferenced. The Root Mean Square error of less than one pixel was the r...